Search CORE

8 research outputs found

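Desarrollo y evaluación de herramientas para alineamiento automático de audio y texto con sistemas de reconocimiento automático del habla (Development and evaluation of tools for automatic audio-text alignment using automatic speech recognition systems)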

Author: Gimeno Jordán Pablo
Olcoz Martínez Julia
Ortega Giménez Alfonso
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/2016

The goal of Automatic Speech Recognition (ASR) is, given a speech signal, to extract the sequence of words that were spoken. To perform this task correctly, an ASR system needs knowledge that it acquires during a training phase. This learning relies on two models: the Acoustic Model, which characterizes the speech signal, and the Language Model, which covers the vocabulary used in it. This Bachelor's thesis (Trabajo Fin de Grado) takes an ASR engine as its starting point to develop and test a system that aligns the script of a television programme with its corresponding audio and obtains a precise temporal location for each spoken word. On this basis, several alignment strategies are considered. The main problem is the uncertainty in locating the text within the audio, since no information is available a priori. The first strategy distributes the text uniformly over the programme's audio. A series of experiments with this strategy characterizes the alignment system and provides a first reference for its performance. To reduce the ambiguity in locating the text within the audio, a new module is added to the alignment system that obtains partial time marks to serve as a guide. A further series of experiments shows that this strategy yields a relative improvement of close to 12% over the performance of the baseline system.
Having demonstrated the effectiveness of partial time marks, and in an attempt to improve the alignment system further, a tool developed to mitigate the recognizer's limitations at word endings is used, yielding a relative improvement of around 20% over the baseline system, which rises to nearly 23% when information about each speaker's turns is included in the alignment system. In view of the results obtained in this Bachelor's thesis, it is concluded that strategies that reduce the uncertainty in locating the text within the audio are well suited to this context, and the performance improvement they bring to the alignment system is demonstrated.
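As an illustrative sketch of the first strategy mentioned in the abstract (a uniform distribution of the text over the audio), the following assumes that every word simply receives an equal share of the total duration. The function name `uniform_alignment` and the example words are hypothetical and not taken from the thesis; the actual system may well distribute the text at a different granularity.

```python
from typing import List, Tuple


def uniform_alignment(words: List[str], audio_duration: float) -> List[Tuple[str, float, float]]:
    """Give each word an equal-length time slot (hypothetical baseline).

    Returns (word, start_seconds, end_seconds) tuples in script order.
    """
    if audio_duration <= 0:
        raise ValueError("audio_duration must be positive")
    if not words:
        return []
    slot = audio_duration / len(words)  # seconds assigned to every word
    return [(word, i * slot, (i + 1) * slot) for i, word in enumerate(words)]


if __name__ == "__main__":
    script = "buenas noches y bienvenidos".split()
    for word, start, end in uniform_alignment(script, 2.0):
        print(f"{word}: {start:.2f}-{end:.2f} s")
```

Such a baseline ignores pauses and speaking rate, which is consistent with the abstract's motivation for adding partial time marks as a guide.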

Repositorio Universidad de Zaragoza

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Author: Alejandro Coucheiro-Limeres
Antonio Cardenal
Antonio Miguel
Carmen Garcia-Mateo
Doroteo T. Toledano
Javier Tejedor
Julia Olcoz
Julian David Echeverry-Correa
Laura Docio-Fernandez
Paula Lopez-Otero
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160).
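The abstract states that each detection reports the speech file, the start and end times of the term, and a confidence score. A minimal sketch of such a record is given below, assuming times in seconds and higher scores meaning higher confidence. The class `Detection`, its field names, the helper `keep_confident`, and the sample values are all hypothetical and do not reproduce the evaluation's actual file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One hypothesised occurrence of a search term (hypothetical layout)."""
    term: str
    file_id: str
    start: float  # seconds from the beginning of the file (assumed unit)
    end: float    # seconds from the beginning of the file (assumed unit)
    score: float  # confidence; higher is assumed to mean more confident


def keep_confident(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose score reaches the decision threshold."""
    return [d for d in detections if d.score >= threshold]


if __name__ == "__main__":
    hypotheses = [
        Detection("lenguaje", "talk_01", 12.4, 12.9, 0.91),
        Detection("lenguaje", "talk_02", 80.0, 80.6, 0.35),
    ]
    for d in keep_confident(hypotheses, 0.5):
        print(d.file_id, d.start, d.end, d.score)
```

Separating the raw score from the thresholded decision lets the same list of hypotheses be re-evaluated at different operating points.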

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Repositorio Universidad de Zaragoza

Biblos-e Archivo

Combining Multiple Approaches to Predict the Degree of Nativeness

Author: Abad Alberto
Batista Fernando
Ferreira Jaime
Moniz Helena
Olcoz Julia
Ribeiro Eugénio
Trancoso Isabel
Publication venue: Technische Universität Berlin
Publication date: 01/01/2015

Automatic speaker nativeness assessment has multiple applications, such as second language learning and IVR systems. In this paper we view this as a regression problem, since the available labels are on a continuous scale. Multiple approaches were applied, such as phonotactic models, i-vectors, and goodness of pronunciation, covering both segmental and suprasegmental features. Different phonotactic models were adopted, either trained with the challenge data, or using additional multilingual data from other domains. The obtained values were later combined in multiple ways and fed to a support vector machine regressor. Results on the test set surpass the provided baseline and are in line with the results obtained on the remaining sets. This suggests that our models generalize well to other datasets.

Universidade de Lisboa: Repositório.UL

ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish

Author: Coucheiro Limeres Alejandro
Docío Fernández Laura
Ferreirós Javier
Hernáez Inma
Llombart Jorge
López Otero Paula
Olcoz Julia
Serrano Luis
Tejedor Noguerales Javier
Toledano Doroteo T
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/03/2022

Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish and an analysis of the results. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation consists in retrieving the speech files that contain the search terms, providing their start and end times, and a score value that reflects the confidence given to the detection. Two different Spanish speech databases have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a detailed discussion. Five different research groups took part in the evaluation, and ten different systems were submitted in total. We compare the systems submitted to the evaluation and make a deep analysis based on some search term properties (term length, within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native (Spanish)/foreign terms).
Xunta de Galicia | Ref. ED431G/01; Ministerio de Economía y Competitividad | Ref. TEC2015-67163-C2-1-R; Ministerio de Economía y Competitividad | Ref. TIN2014-54288-C4-1-R; Ministerio de Economía y Competitividad | Ref. TEC2015-68172-C2-1-

Investigo

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Author: Cardenal López Antonio José
Coucheiro Limeres Alejandro
Docío Fernández Laura
Echeverry Correa Julian David
García Mateo Carmen
López Otero Paula
Miguel Artiaga Antonio
Olcoz Julia
Tejedor Noguerales Javier
Toledano Doroteo T
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/03/2022

Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
Ministerio de Economía y Competitividad | Ref. TEC2012-37585-C02-01; Xunta de Galicia | Ref. 2014/02

Investigo

ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish

Author: Alejandro Coucheiro-Limeres
Doroteo T. Toledano
Inma Hernaez
Javier Ferreiros
Javier Tejedor
Jorge Llombart
Julia Olcoz
Laura Docio-Fernandez
Luis Serrano
Paula Lopez-Otero
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish and an analysis of the results. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation consists in retrieving the speech files that contain the search terms, providing their start and end times, and a score value that reflects the confidence given to the detection. Two different Spanish speech databases have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops, and the EPIC database, which comprises a set of European Parliament sessions in Spanish. We present the evaluation itself, both databases, the evaluation metric, the systems submitted to the evaluation, the results, and a detailed discussion. Five different research groups took part in the evaluation, and ten different systems were submitted in total. We compare the systems submitted to the evaluation and make a deep analysis based on some search term properties (term length, within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native (Spanish)/foreign terms).

Repositorio Universidad de Zaragoza

Directory of Open Access Journals

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Author: Alejandro Coucheiro-Limeres
Antonio Cardenal
Antonio Miguel
Carmen Garcia-Mateo
Doroteo T. Toledano
Javier Tejedor
Julia Olcoz
Julian David Echeverry-Correa
Laura Docio-Fernandez
Paula Lopez-Otero
Publication venue: 'Springer Science and Business Media LLC'

Crossref

ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish

Author: Alejandro Coucheiro-Limeres
Doroteo T. Toledano
Inma Hernaez
Javier Ferreiros
Javier Tejedor
Jorge Llombart
Julia Olcoz
Laura Docio-Fernandez
Luis Serrano
Paula Lopez-Otero
Publication venue: 'Springer Science and Business Media LLC'

Crossref